[v1] Add IS [NOT] TRUE|FALSE|UNKNOWN; make IS NULL and IS MISSING separate AST nodes #1679

Merged: alancai98 merged 2 commits into main from add-bool-test on Dec 16, 2024

Conversation

alancai98
Member

Relevant Issues

Description

  • Adds AST, parser, and evaluation support for SQL99's IS [NOT] TRUE|FALSE|UNKNOWN boolean test predicate
  • AST: refactors NULL and MISSING out of DataType and creates dedicated null and missing predicate nodes
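
For readers unfamiliar with the boolean test predicate, here is a minimal sketch of its SQL99 truth table. It assumes a nullable `Boolean` where Java `null` stands in for UNKNOWN; the class and method names are hypothetical and this is not the code added by this PR.

```java
// Illustrative only: models SQL three-valued logic with a nullable Boolean
// (Java null stands for UNKNOWN). Not the PartiQL implementation.
final class BoolTestSketch {
    enum TruthValue { TRUE, FALSE, UNKNOWN }

    // Evaluates `value IS [NOT] truthValue`.
    // The result is always true or false, never unknown.
    static boolean isTest(Boolean value, TruthValue truthValue, boolean not) {
        boolean matches;
        switch (truthValue) {
            case TRUE:
                matches = Boolean.TRUE.equals(value);
                break;
            case FALSE:
                matches = Boolean.FALSE.equals(value);
                break;
            default: // UNKNOWN
                matches = (value == null);
                break;
        }
        // `not` flips the outcome: IS NOT FALSE holds for both TRUE and UNKNOWN.
        return not != matches;
    }
}
```

Under these assumptions, `isTest(null, TruthValue.TRUE, false)` is false while `isTest(null, TruthValue.FALSE, true)` is true, which is the difference between `x IS TRUE` and `x IS NOT FALSE` for an unknown `x`.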

Other Information

  • Updated Unreleased Section in CHANGELOG: [NO]

    • No v1 yet to be released.
  • Any backward-incompatible changes? [YES]

    • Removal of some DataType AST values and functions (for NULL and MISSING); v1 not yet released so not an issue.
  • Any new external dependencies? [NO]

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 self-assigned this Dec 13, 2024

github-actions bot commented Dec 13, 2024

CROSS-ENGINE-REPORT ❌

| | BASE (LEGACY-V0.14.8) | TARGET (EVAL-C08A6E9) | +/- |
| --- | --- | --- | --- |
| % Passing | 89.67% | 94.32% | 4.65% ✅ |
| Passing | 5287 | 5561 | 274 ✅ |
| Failing | 609 | 54 | -555 ✅ |
| Ignored | 0 | 281 | 281 🔶 |
| Total Tests | 5896 | 5896 | 0 ✅ |

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: c08a6e9
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2641
  • Failing in both: 17
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 5
  • PASSING in BASE but now IGNORED in TARGET: 108
  • FAILING in BASE but now PASSING in TARGET: 180
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 5 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

  1. undefinedUnqualifiedVariableWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  2. undefinedUnqualifiedVariableIsNullExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  3. undefinedUnqualifiedVariableIsMissingExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  4. inPredicateWithTableConstructor, compileOption: PERMISSIVE
  5. notInPredicateWithTableConstructor, compileOption: PERMISSIVE

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-C08A6E9). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ✅

| | BASE (EVAL-158B814) | TARGET (EVAL-C08A6E9) | +/- |
| --- | --- | --- | --- |
| % Passing | 94.32% | 94.32% | 0.00% ✅ |
| Passing | 5561 | 5561 | 0 ✅ |
| Failing | 54 | 54 | 0 ✅ |
| Ignored | 281 | 281 | 0 ✅ |
| Total Tests | 5896 | 5896 | 0 ✅ |

Testing Details

  • Base Commit: 158b814
  • Base Engine: EVAL
  • Target Commit: c08a6e9
  • Target Engine: EVAL

Result Details

  • Passing in both: 5561
  • Failing in both: 54
  • Ignored in both: 281
  • PASSING in BASE but now FAILING in TARGET: 0
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0
  • IGNORED in BASE but now PASSING in TARGET: 0

Comment on lines -73 to -76
// TODO remove `NULL` and `MISSING` variants from DataType
// <absent types>
public static final int NULL = 1;
public static final int MISSING = 2;
Member Author

(self-review) Removed NULL and MISSING from DataType. IS [NOT] NULL|MISSING is now handled by a different node than ExprIsType.

The rest of the changes in this file fix the numbering.

*/
@Builder(builderClassName = "Builder")
@EqualsAndHashCode(callSuper = false)
public class ExprBoolTest extends Expr {
Member Author

(self-review) There are a few different ways to model the bool test predicate and the null/missing predicates. Open to other suggestions if the modeling could be improved. Some alternatives I considered:

  • separate nodes for all (i.e. ExprIsTrue, ExprIsFalse, ExprIsUnknown, ExprIsNull, ExprIsMissing)? <- seemed a bit redundant w/ all the visitors + rewriter methods that would get added
  • modeling IS [NOT] NULL and IS [NOT] MISSING together as an absent node? <- creates the same number of classes (i.e. ExprIsAbsent and AbsentEnum) as just having them separated out. I feel like we won't be adding more absent predicates? But if we will, then an enum might be better.
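
To make the trade-off concrete, here is a rough sketch of the modeling described above: one bool test node carrying its truth value as data, plus separate null and missing predicate nodes. All names (`ModelingSketch`, `BoolTest`, `NullPredicate`, `MissingPredicate`, `SqlPrinter`) are hypothetical, operands are plain strings for brevity, and none of this is the actual partiql-ast API.

```java
// Hypothetical sketch: names below are illustrative, not the partiql-ast API.
final class ModelingSketch {
    // One visitor method per node class; the truth value is data, not a type,
    // so TRUE/FALSE/UNKNOWN share a single visit method.
    interface Visitor<R> {
        R visitBoolTest(BoolTest node);
        R visitNullPredicate(NullPredicate node);
        R visitMissingPredicate(MissingPredicate node);
    }

    static final class BoolTest {
        static final int TRUE = 1, FALSE = 2, UNKNOWN = 3;
        final String operand;
        final boolean not;
        final int truthValue;
        BoolTest(String operand, boolean not, int truthValue) {
            this.operand = operand;
            this.not = not;
            this.truthValue = truthValue;
        }
        <R> R accept(Visitor<R> v) { return v.visitBoolTest(this); }
    }

    static final class NullPredicate {
        final String operand;
        final boolean not;
        NullPredicate(String operand, boolean not) { this.operand = operand; this.not = not; }
        <R> R accept(Visitor<R> v) { return v.visitNullPredicate(this); }
    }

    static final class MissingPredicate {
        final String operand;
        final boolean not;
        MissingPredicate(String operand, boolean not) { this.operand = operand; this.not = not; }
        <R> R accept(Visitor<R> v) { return v.visitMissingPredicate(this); }
    }

    // Renders each node back to SQL text.
    static final class SqlPrinter implements Visitor<String> {
        public String visitBoolTest(BoolTest n) {
            String tv = n.truthValue == BoolTest.TRUE ? "TRUE"
                      : n.truthValue == BoolTest.FALSE ? "FALSE" : "UNKNOWN";
            return n.operand + (n.not ? " IS NOT " : " IS ") + tv;
        }
        public String visitNullPredicate(NullPredicate n) {
            return n.operand + (n.not ? " IS NOT NULL" : " IS NULL");
        }
        public String visitMissingPredicate(MissingPredicate n) {
            return n.operand + (n.not ? " IS NOT MISSING" : " IS MISSING");
        }
    }
}
```

With this shape a visitor gains three methods. The fully separated alternative (one class per truth value) would need five, which is the redundancy mentioned in the first bullet.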

public static final int UNKNOWN = 0;
public static final int TRUE = 1;
public static final int FALSE = 2;
public static final int UNK = 3;
Contributor

Interesting! Maybe OTHER, or maybe we don't need to define the UNK variant for this. It would still force non-exhaustive and we have -1 reserved. Maybe you make it

// implicit unknown is 0
TRUE = 1
FALSE = 2
UNKNOWN = 3

Member Author

I wasn't too sure what to do w/ the name collision. Ideally, for the sake of consistency, I would like to keep the unknown/other variant the same across AstEnums.

Perhaps we could rename all of the unknown/other variants to something that won't collide, like _UNKNOWN or _OTHER? The static functions could then be named _UNKNOWN()/_OTHER().
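
A small sketch of what that rename could look like for the truth-value constants. The class name `TruthValueSketch` and the `name` helper are hypothetical; only the constant names come from this thread.

```java
// Hypothetical sketch of the proposed rename; not the merged code.
final class TruthValueSketch {
    // Reserved slot for an unrecognized variant. The leading underscore keeps
    // it from colliding with the SQL truth value UNKNOWN below.
    static final int _UNKNOWN = 0;
    static final int TRUE = 1;
    static final int FALSE = 2;
    static final int UNKNOWN = 3; // the SQL truth value, as in `x IS UNKNOWN`

    // Maps a code back to its constant name; unrecognized codes fall into
    // the reserved variant.
    static String name(int code) {
        switch (code) {
            case TRUE: return "TRUE";
            case FALSE: return "FALSE";
            case UNKNOWN: return "UNKNOWN";
            default: return "_UNKNOWN";
        }
    }
}
```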

Contributor

I don't think I have a preference on the exact name, so let's check in with John at Tuesday's meeting. I think the important thing is avoiding current and future collisions, as you have suggested.

Member

Yeah, I've been thinking about this. If there is good separation between our library APIs and serialization, then I believe we can remove the UNKNOWN variants of the other enums.

Member Author

Yeah, let's discuss this tomorrow. I'll need to check the original rationale for including an UNKNOWN variant for all of the enums. Perhaps it's required for serde? But we could always add the variant and relevant functions back in the future.

Comment on lines +1047 to +1049
input = "'foo' IS TRUE",
expected = Datum.missing(),
mode = Mode.PERMISSIVE()
Contributor

Something interesting is that Postgres does:

'1' IS TRUE  -- true
'foo' IS TRUE -- err!

Member Author

So in strict mode, 'foo' IS TRUE should error (assuming #1680 gets fixed) since the expression for IS <truth value> expects a boolean value.

This is also consistent w/ what we do for the other boolean constructs like AND, OR, NOT (i.e. data type mismatch error in strict, missing in permissive).
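
As a rough illustration of that intended strict/permissive split, here is a sketch of `IS TRUE` over an untyped operand. The names (`ModeSketch`, `Mode`, `MISSING`, `isTrue`) are hypothetical stand-ins and do not reflect the evaluator's real types. Java `null` stands in for a null/unknown operand.

```java
// Hypothetical sketch of the intended behavior; not the PartiQL evaluator.
final class ModeSketch {
    enum Mode { STRICT, PERMISSIVE }

    // Sentinel standing in for the MISSING value.
    static final Object MISSING = new Object() {
        @Override public String toString() { return "MISSING"; }
    };

    // Evaluates `operand IS TRUE`.
    static Object isTrue(Object operand, Mode mode) {
        // Boolean or unknown (null) operands always yield a definite boolean.
        if (operand == null || operand instanceof Boolean) {
            return Boolean.valueOf(Boolean.TRUE.equals(operand));
        }
        // Any other type is a data type mismatch:
        // an error in strict mode, MISSING in permissive mode.
        if (mode == Mode.STRICT) {
            throw new IllegalArgumentException(
                "data type mismatch: expected a boolean operand, got " + operand);
        }
        return MISSING;
    }
}
```

Here `isTrue("foo", Mode.PERMISSIVE)` returns the `MISSING` sentinel, matching the test case under review, while the same call in `Mode.STRICT` throws.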

Postgres allows certain strings as input to functions that expect booleans. From their docs:

The datatype input function for type boolean accepts these string representations for the “true” state:

  • true
  • yes
  • on
  • 1

and these representations for the “false” state:

  • false
  • no
  • off
  • 0

Unique prefixes of these strings are also accepted, for example t or n. Leading or trailing whitespace is ignored, and case does not matter.

-- in Postgres
-- IS TRUE
SELECT 'no' IS TRUE -- false
SELECT 'foo' IS TRUE -- Query Error: invalid input syntax for type boolean: "foo"
-- IS FALSE
SELECT 'no' IS FALSE -- true
SELECT 'foo' IS FALSE -- Query Error: invalid input syntax for type boolean: "foo"
-- IS UNKNOWN
SELECT 'no' IS UNKNOWN -- false
SELECT 'foo' IS UNKNOWN -- Query Error: invalid input syntax for type boolean: "foo"
-- other boolean exprs
SELECT NOT 'no' -- true
SELECT NOT 'foo' -- Query Error: invalid input syntax for type boolean: "foo"
SELECT true AND 'no' -- false
SELECT true AND 'foo' -- Query Error: invalid input syntax for type boolean: "foo"
SELECT false OR 'no' -- false
SELECT false OR 'foo' -- Query Error: invalid input syntax for type boolean: "foo"
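
The input rules quoted from the Postgres docs can be approximated as follows. This is a sketch written from the quoted description only (accepted words, unique prefixes, whitespace and case handling); it is not Postgres's actual implementation, and the class name `PgBoolInputSketch` is made up.

```java
// Approximation of the documented rules above; not Postgres source code.
final class PgBoolInputSketch {
    private static final String[] TRUE_WORDS = {"true", "yes", "on", "1"};
    private static final String[] FALSE_WORDS = {"false", "no", "off", "0"};

    // Parses a string as a boolean: trims whitespace, ignores case, and
    // accepts any prefix that identifies exactly one accepted word.
    static boolean parse(String input) {
        String s = input.trim().toLowerCase(java.util.Locale.ROOT);
        int trueMatches = countPrefixMatches(s, TRUE_WORDS);
        int falseMatches = countPrefixMatches(s, FALSE_WORDS);
        // An ambiguous prefix such as "o" (on/off) or no match at all is rejected.
        if (s.isEmpty() || trueMatches + falseMatches != 1) {
            throw new IllegalArgumentException(
                "invalid input syntax for type boolean: \"" + input + "\"");
        }
        return trueMatches == 1;
    }

    private static int countPrefixMatches(String prefix, String[] words) {
        int count = 0;
        for (String word : words) {
            if (word.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```

Under this sketch `parse("no")` is false, which lines up with `'no' IS TRUE` being false above, and `parse("foo")` throws with the same message text as the query errors shown.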

Contributor

@rchowell rchowell left a comment

I'm approving AS-IS to keep you moving forward. I've created a follow-up for tues. #1682

@alancai98 alancai98 merged commit b8100f5 into main Dec 16, 2024
14 checks passed
@alancai98 alancai98 deleted the add-bool-test branch December 16, 2024 18:04